Abstract. - An important problem in the analysis of experimental data showing fractal
properties is that such samples are composed of a set of points limited by an upper and a
lower cut-off. We study how finite size effects due to the discreteness of the sets may
influence self-similar properties even far from these cut-offs. Estimates of these effects
are provided on the basis of the characteristics of the samples. In particular, we present an
estimate of the length scale above which information about average quantities is reliable, by
explicitly computing discreteness effects in number counting. The results are of particular
importance in the statistical analysis of the distribution of galaxies.



The simplest way to characterise a fractal structure is to analyse the scaling behaviour, on
large sets of data, of suitably averaged quantities such as the two-point correlation function
[?]. Here, we introduce a more complete description of self-similar structures by including the
properties of the fluctuations of such quantities with respect to their average values. Previous
approaches to this problem include the analysis of the void distribution [?] and the analysis of
the three-point correlation function [?].

In order to introduce a theory of the fluctuations in fractal patterns, it is crucial to
characterise the finite size effects [?]. Real data are sets of a finite number of points,
and fluctuation analysis requires much larger samples than those needed to compute the
averaged quantities. In particular, in the field of large scale structure astrophysics,
unaveraged quantities are widely used in the statistical analysis of galaxy correlations
because of observational limitations [?, ?, 9].

In this work we further develop fluctuation analysis and give quantitative criteria to define
its statistical significance for a given data set. To proceed with this analysis we
distinguish between finite size effects related to the shape of the data set boundaries and
finite size effects due to the discreteness of the sample. To avoid the former we carefully
select the region used for number counting. To address the latter we compute the
probability distribution of fluctuations in number counts on a discrete set of points.

The scaling quantity we will consider is the conditional average number mass $N(r)$ defined
below. Given a set of $N_{\rm tot}$ points, let $n_i(r)$ be the Conditional number Mass (CM)
from the point $i$, defined as follows:

$$n_i(r) = \int_{S(r,i)} \sum_j \delta(\mathbf{x} - \mathbf{x}_j)\, d^d x , \qquad (1)$$

where $j$ runs over all the points in the set and $S(r,i)$ is the $d$-dimensional hypersphere
with radius $r$ centred at the point $i$. $N(r)$ at a given $r$ is defined as the average of the
CMs over all the points whose distance $\ell_i$ from the sample boundaries is greater than $r$:

$$N(r) = \langle n_i(r) \rangle_{\ell_i > r} . \qquad (2)$$

The supplementary condition is needed in order to avoid the finite size effects linked to the
geometrical shape of the sample, as pointed out in Refs. [?]. With this restriction we can now
address the problem of the discreteness of the data set.
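
As an illustration of eqs. (1) and (2), the counting procedure can be sketched for a two-dimensional sample in a square domain. This is a minimal brute-force sketch, not the procedure used for the figures: the function names are hypothetical, the domain is assumed to be the square $[0, \mathrm{side}]^2$, and the centre point itself is assumed to be excluded from its own count, so that a count of zero is possible.

```python
import math

def conditional_mass(points, i, r):
    """n_i(r): number of points within distance r of point i (eq. 1).

    Assumption of this sketch: the centre itself is not counted.
    """
    xi, yi = points[i]
    return sum(
        1
        for j, (x, y) in enumerate(points)
        if j != i and math.hypot(x - xi, y - yi) <= r
    )

def boundary_distance(point, side):
    """Distance from a point to the nearest edge of the square [0, side]^2."""
    x, y = point
    return min(x, y, side - x, side - y)

def average_conditional_mass(points, r, side):
    """N(r): mean of n_i(r) over centres farther than r from the boundary (eq. 2).

    Returns None when no centre satisfies the boundary condition.
    """
    centres = [i for i, p in enumerate(points) if boundary_distance(p, side) > r]
    if not centres:
        return None
    return sum(conditional_mass(points, i, r) for i in centres) / len(centres)
```

The double loop costs a number of operations proportional to the square of the number of points, so for large samples a spatial index would be needed; the sketch only fixes the definitions.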
For a fractal set of dimension $D$, $N(r)$ takes the form [?]:

$$N(r) = B r^{D} , \qquad (3)$$

where $B$ is a prefactor independent of $D$. In fig. 1, we compare the behaviour of a single CM
to $N(r)$ for one of the fractal sets described below. The multiplicative character of the
fluctuations is reflected in their constant amplitude in the log-log plot. In order to compare
fluctuations at different scales we introduce the normalized conditional number:

$$f_i(r) = \frac{n_i(r)}{N(r)} . \qquad (4)$$
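
In practice, $B$ and $D$ of eq. (3) are usually read off a straight-line fit in the log-log plane, and eq. (4) is then a pointwise ratio. The following is a minimal sketch under that assumption; the function names are hypothetical and an ordinary unweighted least-squares fit is assumed, which is not necessarily the fitting procedure used for the figures.

```python
import math

def fit_power_law(radii, counts):
    """Least-squares fit of log N = log B + D log r; returns (B, D)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(n) for n in counts]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    D = sxy / sxx                  # slope in the log-log plane
    B = math.exp(my - D * mx)      # prefactor from the intercept
    return B, D

def normalized_counts(single_counts, average_counts):
    """f_i(r) = n_i(r) / N(r) at each scale (eq. 4)."""
    return [n / N for n, N in zip(single_counts, average_counts)]
```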

For $N(r) \gg 1$ it is possible to introduce a continuous representation for the variable $f$,
whose associated probability distribution $V(f)$ is independent of $r$ because of self-similarity.
On the other hand, one expects the continuous limit not to hold at scales close to the lower
cut-off. This is because in such a range the CM can take only a small set of discrete values,
corresponding to the presence of 0, 1, 2, ... points inside the $d$-dimensional hypersphere. For
this reason, it is possible to observe fluctuations that are not scale invariant at small scales
even if the averaged quantities show the correct scaling behaviour. The effect of the
discreteness on fluctuations is particularly evident when fractal correlations are present at
distances much smaller than the typical distance between neighbouring points. Samples with this
property are equivalent to sets of points picked randomly from much larger fractals.

Generally, the probability $\Pi(n, r)$ of finding $n$ points inside a $d$-dimensional hypersphere
of radius $r$ centred on a point of the set is the convolution of the scale invariant distribution
density $V(f)$ with a Poisson distribution $P_m(n)$ with average $m = f N(r) = f B r^{D}$:

$$\Pi(n, r) = \int_0^{\infty} df\, V(f)\, P_{fN(r)}(n) = \langle P_{fN(r)}(n) \rangle_f , \qquad (5)$$

where the notation $\langle \phi(f) \rangle_f = \int_0^{\infty} df\, V(f)\, \phi(f)$ stands for the
average with respect to the distribution function $V(f)$. We also define
$\langle \phi(n) \rangle_n = \sum_{n=0}^{\infty} P_{fN(r)}(n)\, \phi(n)$ for the average
with respect to the Poisson distribution. The variance of the number of points $n$ is then:

$$\sigma^2(r) = \sum_{n=0}^{\infty} \Pi(n, r)\, n^2 - N^2(r) = \langle \langle n^2 \rangle_n \rangle_f - N^2(r) , \qquad (6)$$

where in the last equality we exchanged the sum with the integral. Inserting eq. (5) in eq.
(6), and adding and subtracting the quantity $\langle f^2 N^2(r) \rangle_f$, we obtain:

$$\sigma^2(r) = \langle \langle n^2 - f^2 N^2(r) \rangle_n \rangle_f + \langle f^2 N^2(r) - N^2(r) \rangle_f . \qquad (7)$$

This gives:

$$\sigma^2(r) = N(r) + N^2(r)\, \sigma_{si}^2 , \qquad (8)$$

where $\sigma_{si}^2$ is the variance of the scale invariant distribution $V(f)$. In the first term
of eq. (7), the inner average is the variance of the Poisson distribution, i.e. $f N(r)$, and the
outer average with respect to $f$ gives $N(r)$, because $\langle f \rangle_f = 1$.
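
Eq. (8) can be checked numerically on a toy case. The sketch below is only an illustration: it replaces $V(f)$ by a hypothetical discrete distribution with unit mean, builds $\Pi(n, r)$ of eq. (5) by direct summation, and compares the variance of eq. (6) with the right-hand side of eq. (8). The function names and the truncation `n_max` are assumptions of the sketch.

```python
import math

def poisson_pmf_table(mean, n_max):
    """Poisson probabilities P_mean(n) for n = 0..n_max, built recursively."""
    probs = [math.exp(-mean)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * mean / n)
    return probs

def count_variance(f_values, f_weights, N, n_max=400):
    """Variance of n under Pi(n, r) of eq. (5), for a discrete toy V(f).

    f_values, f_weights: hypothetical discrete stand-in for V(f); the weights
    must sum to 1 and the mean of f must be 1.
    """
    pi = [0.0] * (n_max + 1)
    for f, w in zip(f_values, f_weights):
        for n, p in enumerate(poisson_pmf_table(f * N, n_max)):
            pi[n] += w * p
    second_moment = sum(n * n * p for n, p in enumerate(pi))
    return second_moment - N * N  # eq. (6)

def predicted_variance(f_values, f_weights, N):
    """Right-hand side of eq. (8): N + N^2 * sigma_si^2."""
    sigma_si2 = sum(w * (f - 1.0) ** 2 for f, w in zip(f_values, f_weights))
    return N + N * N * sigma_si2
```

For instance, with $f$ taking the values 0.5 and 1.5 with equal weight and $N(r) = 4$, one has $\sigma_{si}^2 = 0.25$ and both functions give a variance of 8, up to the truncation error of the sum.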

This expression relates the variance of the number of points in a discrete fractal set to the
intrinsic variance of the fractal probability distribution $V(f)$. The variance of the
fluctuations is then:

$$\sigma_f^2(r) = \frac{\sigma^2(r)}{N^2(r)} = \frac{1}{N(r)} + \sigma_{si}^2 , \qquad (9)$$

where the effect of the random sampling is simply to add the Poisson contribution $1/N(r)$
to the scale invariant term $\sigma_{si}^2$. Through the expression in eq. (9) it is possible to
evaluate $\sigma_{si}^2$ at any scale. Moreover, we can estimate the scale beyond which the Poisson
fluctuations are negligible with respect to the intrinsic ones. In order to have the Poisson term,
e.g., 10 times smaller than the intrinsic one, the CM should be fitted for $r$ larger than the
minimal statistical length $\lambda$:

$$\lambda = \left( \frac{10}{B\, \sigma_{si}^2} \right)^{1/D} . \qquad (10)$$

Eq. (10) with $\sigma_{si}^2 = 1$ has been derived in a phenomenological way in the context of the
large scale structure of the Universe [11, ?]. The quantity $\sigma_f^2(r)$ has been called
lacunarity [?], and is usually referred to as one of the intrinsic characteristics of a fractal
set. However, the scale invariant value $\sigma_{si}^2$ appears only in the $r \gg \lambda$ region.
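
The two estimates above reduce to one-line formulas, sketched here under the assumption that $N(r) = B r^{D}$ holds at the scales of interest. The function names and the `ratio` parameter are hypothetical; `ratio = 10` corresponds to the criterion of eq. (10).

```python
def intrinsic_variance(sigma_f2, N):
    """sigma_si^2 from eq. (9): subtract the Poisson term 1/N(r)."""
    return sigma_f2 - 1.0 / N

def minimal_statistical_length(B, D, sigma_si2, ratio=10.0):
    """Scale lambda above which 1/N(r) <= sigma_si^2 / ratio, with N(r) = B r^D.

    ratio = 10 reproduces the criterion of eq. (10).
    """
    return (ratio / (B * sigma_si2)) ** (1.0 / D)
```

As a check of the algebra, with $B = 2$, $D = 0.5$ and $\sigma_{si}^2 = 1$ the function returns $\lambda = 25$, where $N(\lambda) = 10$ and the Poisson term equals 0.1.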

Having defined this quantity, it is possible to further characterise the fractal set. This is
done by noticing that fluctuations show a characteristic "correlation length" on a log-log
scale. This corresponds to a correlation scale difference $\Delta_c$ above which fluctuations are
uncorrelated, and it can be used as an estimate of the limiting "length" that has to be
considered in order to deal with reliable data.

The correlation function between fluctuations at scales $r_1$ and $r_2$ is:

$$S(r_2, r_1) = \langle (f(r_2) - 1)(f(r_1) - 1) \rangle = \langle \delta f(r_2)\, \delta f(r_1) \rangle . \qquad (11)$$

In the self-similar regime one expects $S$ to be a function of the ratio of the scales, i.e.
$S(r_2, r_1) = S(\log_{10}(r_2/r_1)) = S(\Delta)$. An estimate of $S(\Delta)$ may be found in the
following way: let $n_i(r_1)$ be the CM from the point $i$ up to a scale $r_1$, $n_i(r_2)$ the CM
up to a scale $r_2 > r_1$, and $n_i(r_1, r_2)$ the number of points in the shell between $r_1$ and
$r_2$. We can express $\delta f_i(r_2)$ in the following way:

$$\delta f_i(r_2) = f_i(r_2) - 1 = \frac{n_i(r_1) + n_i(r_1, r_2)}{N(r_2)} - 1 = \frac{N(r_1)}{N(r_2)}\, \delta f_i(r_1) + \delta f_i(r_1, r_2) , \qquad (12)$$

where $\delta f_i(r_1, r_2) = [\, n_i(r_1, r_2) - N(r_2) + N(r_1) \,] / N(r_2)$ is the normalized
fluctuation of the number of points in the shell.

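Assuming that the number of points in the shell between $r_1$ and $r_2$, $n_i(r_1, r_2)$, is
uncorrelated with $n_i(r_1)$, we have $\langle \delta f_i(r_1)\, \delta f_i(r_1, r_2) \rangle = 0$.
Therefore:

$$S(\Delta) = \langle (\delta f(r_1))^2 \rangle \left( \frac{r_1}{r_2} \right)^{D} = \sigma_f^2 \cdot 10^{-\Delta/\Delta_c} , \qquad (13)$$

where $\Delta_c = D^{-1}$. In the general case $\langle \delta f_i(r_1)\, \delta f_i(r_1, r_2) \rangle > 0$,
and eq. (13) is then a lower limit.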
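
The estimate of eq. (13) is simple to evaluate. The sketch below assumes the uncorrelated-shell hypothesis stated above and uses hypothetical function names; it gives the lower limit only, not the measured correlation.

```python
import math

def scale_difference(r1, r2):
    """Delta = log10(r2 / r1) for r2 > r1."""
    return math.log10(r2 / r1)

def correlation_lower_bound(delta, D, sigma_f2):
    """S(Delta) = sigma_f^2 * 10^(-D * Delta), i.e. eq. (13) with Delta_c = 1/D."""
    return sigma_f2 * 10.0 ** (-D * delta)
```

With this form, the correlation drops by a factor of 10 every time the scale difference grows by $\Delta_c = 1/D$: for $D = 0.5$ and $\sigma_f^2 = 1$, two decades in $r$ reduce $S$ to 0.1.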

To show the applicability of the formulae derived above, we analyse selected fractal samples.
These fractal sets are obtained by picking points at random from larger samples generated by a
random $\beta$ model (RBM) algorithm [11] in two dimensions. The original sets contain from
$10^9$ to $10^{10}$ points, while the diluted samples contain around $10^7$ points, in a square
domain of side 2.
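
For orientation, a cascade of this kind followed by random dilution can be sketched as follows. This is not the algorithm of the cited reference, whose details are not given here. It is a simplified stand-in that assumes each of the four sub-squares produced at every subdivision is kept independently with probability $2^{D-2}$, so that the mean number of surviving sub-squares per step is $2^{D}$. All names are hypothetical.

```python
import random

def random_cascade(D, levels, side=2.0, rng=None):
    """Return centres of the cells that survive `levels` subdivision steps.

    Assumption of this sketch: each sub-square survives independently with
    probability 2**(D - 2).
    """
    if rng is None:
        rng = random.Random()
    keep = 2.0 ** (D - 2)
    cells = [(0.0, 0.0)]           # lower-left corners of surviving cells
    size = side
    for _ in range(levels):
        size /= 2.0
        cells = [
            (x + dx * size, y + dy * size)
            for (x, y) in cells
            for dx in (0, 1)
            for dy in (0, 1)
            if rng.random() < keep
        ]
    return [(x + size / 2.0, y + size / 2.0) for (x, y) in cells]

def dilute(points, k, rng=None):
    """Pick at most k points uniformly at random without replacement."""
    if rng is None:
        rng = random.Random()
    return rng.sample(points, min(k, len(points)))
```

In this sketch the cell size after the last step plays the role of the minimal distance between points, and dilution increases the typical nearest-neighbour distance without changing $D$. For small $D$ a single realisation may die out, in which case the returned list is empty.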

Two characteristic lengths of the sets are, respectively, the minimal possible distance
between points, $\epsilon$ (which also corresponds to the minimal distance at which one observes
fractal correlations), and the average distance to the nearest point, $r_0$, which is related to
the prefactor $B$ through $B r_0^{D} \sim 1$. Whilst the two lengths may differ substantially for
highly diluted samples, they are almost identical for the RBM without dilution.

In fig. 1, we show $N(r)$ for an RBM fractal with $D = 0.5$, $\epsilon \sim 10^{-15}$ and
$r_0 \sim 10^{-14}$. The dotted line is the theoretical behaviour of $N(r)$ in this sample and the
open circles are the values of the CM from a randomly chosen point.

Fig. 2 reports the quantity $\sigma_f^2(r)$ for two sets of data: the fractal with $D = 0.5$ and
another one with $D = 1.8$ ($\epsilon \sim 10^{-5}$ and $r_0 \sim 10^{-4}$). In eq. (9) we take
$N(r)$ as the independent variable; in this way it is possible to fit the data to eq. (9) with a
single parameter $\sigma_{si}^2$ (solid lines). The $N^{-1}(r)$ scaling is the Poissonian term and
the plateau gives the intrinsic value $\sigma_{si}^2$. At large $N(r)$, deviations from the
theoretical behaviour are observed. The reason may be understood through the following argument.
All the points belonging to a cluster of size $r_c$ have correlated $n_i(r)$ for $r > r_c$. Sample
independent behaviour of the fluctuations is obtained for $r \sim r_c$ as long as a large number of
independent clusters of size $r_c$ is contained in the sample volume. This condition is less and
less fulfilled as $r$ increases, which gives rise to a strong reduction of $\sigma_f^2(r)$ on
approaching the sample size. In the inset, we show the distributions of the fluctuations in the
scale invariant regions, $V(f)$, for the two samples.

In fig. 3 we show the probability distribution of the fluctuations $f$ at various $r$, for
the $D = 1.8$ fractal. Continuous lines refer to $r$ in the range $[10^{-3}, 10^{-2}]$, where the
good superposition of the different curves signals the scale invariance of the fluctuations.
Outside this region the measured distribution deviates from the intrinsic shape. The histogram
of fig. 3 is the probability distribution at smaller scales ($\sim 10^{-4}$), outside the scale
invariant regime. Its main features are the discreteness of the allowed values of $f$ and the
larger variance due to the Poisson noise, as explained before.

Fig. 4 shows the measured correlation between fluctuations, $S(r_2/r_1)$, at different scales.
The points refer to three fractal sets with fractal dimension $D = 0.5$, $1$ and $1.8$
respectively, and are the average of $S(r_2/r_1)$ over all $r_1$ and $r_2$ inside the scale
invariant range. While for small fractal dimension, $D = 0.5$, the values estimated from eq. (13)
are in rather good agreement with the data, the sets with $D$ closer to the dimension of the
embedding space ($D = 1.8$) show higher additional correlations between fluctuations at different
scales. According to eq. (13) we may estimate the correlation scale difference to be
$\Delta_c \simeq D^{-1}$ for small fractal dimension. On the other hand, for large dimensions
$\Delta_c$ appears not to decrease below 1. An estimate of the correlation scale difference in the
two-dimensional RBM is:

$$\Delta_c = \mathrm{Max}\left( \frac{1}{D},\, 1 \right) . \qquad (14)$$

Recovering the average behaviour of $N(r)$ from the measurement of a single $n_i(r)$ then
requires a range of scales larger than $\Delta_c$.

The main result is that this analysis allows one to specify the value of $\Delta_c$. It thereby
introduces a measure for a quantity which it has been customary to define as "large enough" [?].
For instance, the fractal with dimension $D = 0.5$ has $\Delta_c = 2$, and in fig. 1 the
corresponding CM shows a typical length of fluctuations of 2 to 3 orders of magnitude.

A direct application of these results is in the study of the statistical properties of the
galaxy distribution. Here, a fundamental measure is the count of galaxies versus the distance
from the Earth, or galaxy CM, $n_E(r)$. The interpretation of the behaviour of the galaxy CM is
crucial in the debate about the existence of a homogeneity scale of the galaxy distribution
in the Universe, which is one of the pillars of the standard Big Bang theory. On the other hand,
the amount of available data is not very large, which makes it important to check for finite
size effects.

In conclusion, the intrinsic distribution of the fluctuations $V(f)$ can be measured only for
scales greater than the minimal statistical length $\lambda$, defined in eq. (10). On the other
hand, the lacunarity of a discrete fractal may be evaluated at all scales as $\sigma_f^2(r)$. In
those cases in which the CM has to be used to extract fractal parameters from experiments, data
for $r < \lambda$ have to be discarded and a range of scales larger than $\Delta_c$ is needed.
Fig. 1. - The solid line is the logarithm of the conditional average number of points as a
function of the logarithm of the distance for a random $\beta$ model with dimension $D = 0.5$ and
$10^7$ points. Open circles are the counts from one point picked randomly. The dotted line is the
expected theoretical power law. The small deviations from the power law at small and large scales
are due to different kinds of finite size effects.

Fig. 2. - Reduced variance vs. the average number of points, on a logarithmic scale. Squares
refer to an RBM with $D = 1.8$ and circles to one with $D = 0.5$. The solid and the dashed lines
are fits to eq. (9). The inset shows the numerical $V(f)$ for the two sets above.

Fig. 3. - Solid lines refer to the numerical probability density of the fluctuation at several
different scales within the range of self-similarity ($D = 1.8$). The histogram is the same
quantity at smaller scales, in the regime where the discretization of $f$ and the Poisson noise
are much more relevant.

Fig. 4. - Correlation between fluctuations versus the scale difference. Circles, squares and
diamonds refer to fractals with $D = 0.5$, $1$ and $1.8$ respectively. Lines are fits to eq. (13).



